Hive Query Language (HQL)

HQL is a simple SQL-like query language used to manage and query large datasets, aimed at enterprises that work with voluminous data almost every day. HQL is easy to work with if you already know SQL. Experienced Hive programmers can also plug in custom MapReduce code (mappers and reducers) to perform more sophisticated data analysis.
  • Apache Hive runs on top of Hadoop's distributed storage (HDFS); it does not provide the storage layer itself.
  • Hive offers a range of tools for quick data ETL (Extract/Transform/Load).
  • It can impose structure on a variety of data formats.
  • With Hive, you can access data stored in other Hadoop ecosystem systems such as HDFS or HBase.
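
To illustrate how close HQL is to SQL, here is a minimal sketch of an aggregate query. It assumes a table named emp with deptno and sal columns already exists, like the one defined at the end of this post; nothing else about the cluster is assumed.

```sql
-- Average salary per department, highest average first.
-- Assumes a table emp(deptno int, sal float, ...) already exists.
SELECT deptno,
       COUNT(*) AS employees,
       AVG(sal) AS avg_sal
FROM emp
GROUP BY deptno
ORDER BY avg_sal DESC;
```

Hive compiles a statement like this into jobs for the configured execution engine, so no hand-written MapReduce code is needed.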

Why does Hive not store metadata information in HDFS?

Hive's table data is stored in HDFS. The metadata, however, is stored either locally (in an embedded Derby database by default) or in an external RDBMS. It is not kept in HDFS because HDFS read/write operations are time-consuming: HDFS is designed for large sequential reads and writes, not for the small, frequent lookups and updates that metadata requires. Hive therefore keeps its metadata in the metastore, backed by an RDBMS, which gives low-latency access.
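
The statements below are a small sketch of metadata lookups that Hive answers from the metastore rather than by scanning data files. The table name emp is an assumption taken from the example at the end of this post.

```sql
-- List the databases and tables registered in the metastore.
SHOW DATABASES;
SHOW TABLES;

-- Show columns, storage format, location and table type for one table.
-- Assumes a table named emp exists.
DESCRIBE FORMATTED emp;
```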


Important commands
Print column names as a header row in Hive CLI query output:
set hive.cli.print.header=true;

How to set the execution engine in Hive
Run one of the following, depending on which engine is installed on the cluster:
set hive.execution.engine=tez;
set hive.execution.engine=spark;
set hive.execution.engine=mr;
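
To confirm which engine the current session is using, a set command with no value prints the property together with its current setting. A minimal sketch:

```sql
-- Prints the property name and its current value for this session.
set hive.execution.engine;
```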

Creating an external table
The following statement defines an external table over comma-delimited text files:
CREATE EXTERNAL TABLE emp
(
   empno int,
   ename varchar(20),
   job varchar(20),
   mgr int,
   hiredate string,
   sal float,
   comm float,
   deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
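
Once the emp table exists, it might be populated and checked as sketched below. The HDFS path /user/hive/input/emp.csv is hypothetical; substitute the location of your own comma-delimited file.

```sql
-- Hypothetical path: move a comma-delimited file that is already in HDFS
-- into the directory backing the emp table.
LOAD DATA INPATH '/user/hive/input/emp.csv' INTO TABLE emp;

-- Check that the rows parse as expected.
SELECT empno, ename, job, sal
FROM emp
LIMIT 10;
```

Because the table is external, DROP TABLE emp removes only the metastore entry; the underlying files remain in HDFS.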
